Multiple Pattern Matching in LZW Compressed Text
Abstract
In this paper we address the problem of searching LZW compressed text directly, and present a new algorithm for finding multiple patterns by simulating the moves of the Aho-Corasick pattern matching machine. The new algorithm finds all occurrences of multiple patterns, whereas the algorithm proposed by Amir, Benson, and Farach finds only the first occurrence of a single pattern. The new algorithm runs in O(n + m^2 + r) time using O(n + m^2) space, where n is the length of the compressed text, m is the total length of the patterns, and r is the number of occurrences of the patterns. We implemented a simple version of the algorithm and showed that it is approximately twice as fast as decompression followed by a search using the Aho-Corasick machine.
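For orientation, the baseline that the abstract compares against (decompress first, then run the Aho-Corasick machine over the plain text) might be sketched as follows. This is not the paper's algorithm, which avoids decompression. The LZW variant (a dictionary seeded with the sorted alphabet of the text, unbounded codes) and all function names here are assumptions made for illustration only.

```python
from collections import deque

def lzw_compress(text):
    """Toy LZW encoder (assumed variant: dictionary seeded with the text's alphabet)."""
    alphabet = sorted(set(text))
    table = {ch: i for i, ch in enumerate(alphabet)}
    codes, w = [], ""
    for ch in text:
        if w + ch in table:
            w += ch
        else:
            codes.append(table[w])
            table[w + ch] = len(table)  # new phrase gets the next free code
            w = ch
    if w:
        codes.append(table[w])
    return alphabet, codes

def lzw_decompress(alphabet, codes):
    """Inverse of lzw_compress; rebuilds the dictionary while decoding."""
    if not codes:
        return ""
    table = list(alphabet)
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        # A code equal to len(table) refers to the phrase being defined right now.
        entry = table[code] if code < len(table) else prev + prev[0]
        table.append(prev + entry[0])
        out.append(entry)
        prev = entry
    return "".join(out)

def build_ac(patterns):
    """Build goto, failure, and output functions of an Aho-Corasick machine."""
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]  # inherit matches ending at the failure state
            queue.append(t)
    return goto, fail, out

def ac_search(text, patterns):
    """Return (start_index, pattern) for every occurrence of every pattern."""
    goto, fail, out = build_ac(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits

def search_by_decompression(alphabet, codes, patterns):
    """Baseline: decompress the whole text, then scan it with Aho-Corasick."""
    return ac_search(lzw_decompress(alphabet, codes), patterns)
```

A call such as `search_by_decompression(*lzw_compress(text), patterns)` reports every occurrence, but its running time depends on the length of the decompressed text rather than on the compressed length n, which is the cost the direct approach aims to avoid.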
Similar resources
Shift-And Approach to Pattern Matching in LZW Compressed Text
This paper considers the Shift-And approach to the problem of pattern matching in LZW compressed text, and gives a new algorithm that solves it. The algorithm is indeed fast when a pattern length is at most 32, or the word length. After an O(m + |Σ|) time and O(|Σ|) space preprocessing of a pattern, it scans an LZW compressed text in O(n + r) time and reports all occurrences of the pattern, whe...
Tying up the loose ends in fully LZW-compressed pattern matching
We consider a natural generalization of the classical pattern matching problem: given compressed representations of a pattern p[1..M] and a text t[1..N] of sizes m and n, respectively, does p occur in t? We develop an optimal linear time solution for the case when p and t are compressed using the LZW method. This improves the previously known O((n + m) log(n + m)) time solution of Gąsien...
A Unifying Framework for Compressed Pattern Matching
We introduce a general framework that captures the essence of compressed pattern matching for various dictionary-based compression methods. The goal is to find all occurrences of a pattern in a text without decompression, which is one of the most active topics in string matching. Our framework covers such compression methods as the Lempel-Ziv family (LZ77, LZSS, LZ78, LZW), byte-...
Almost Optimal Fully LZW-Compressed Pattern Matching
Given two strings: pattern P and text T of lengths |P| = M and |T| = N. A string matching problem is to find all occurrences of pattern P in text T. A fully compressed string matching problem is the string matching problem with input strings P and T given in compressed forms p and t respectively, where |p| = m and |t| = n. We present first, almost optimal, string matching algorithms for LZW-compr...
Beating O(nm) in approximate LZW-compressed pattern matching
Given an LZW/LZ78 compressed text, we want to find an approximate occurrence of a given pattern of length m. The goal is to achieve time complexity depending on the size n of the compressed representation of the text instead of its length. We consider two specific definitions of approximate matching, namely the Hamming distance and the edit distance, and show how to achieve O(n √ mk) and O(n √ ...
Publication date: 1998